    HybridMiner: Mining Maximal Frequent Itemsets Using Hybrid Database Representation Approach

    In this paper we present a novel hybrid (array-based layout and vertical bitmap layout) database representation approach for mining complete Maximal Frequent Itemsets (MFI) on sparse and large datasets. Our work is novel in terms of scalability, item search order, and two projection techniques (horizontal and vertical). We also present a maximal itemset mining algorithm that uses this hybrid database representation. Experimental results on real and sparse benchmark datasets show that our approach outperforms previous state-of-the-art maximal algorithms. Comment: 8 pages. In the proceedings of 9th IEEE-INMIC 2005, Karachi, Pakistan, 200
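
    To illustrate the vertical bitmap half of such a representation, the sketch below stores each item as an integer bitmask over transaction IDs, so the support of an itemset is the popcount of the AND of its items' bitmaps. It runs a naive depth-first search that keeps a node only when it has no frequent extension and is not contained in an already-found MFI. This is a minimal sketch under those assumptions, not HybridMiner itself: it omits the array-based layout, the projection techniques and any pruning. The function names are hypothetical.

```python
def vertical_bitmaps(transactions):
    """Vertical bitmap layout: item -> int whose bit t is set when transaction t contains the item."""
    bitmaps = {}
    for tid, txn in enumerate(transactions):
        for item in txn:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support(bitmap):
    # Support of an itemset = number of set bits in its intersected bitmap.
    return bin(bitmap).count("1")


def mine_mfi(transactions, min_sup):
    """Naive depth-first MFI search over vertical bitmaps (illustrative, not HybridMiner)."""
    bitmaps = vertical_bitmaps(transactions)
    # Frequent items in increasing-support order (one common search-order heuristic).
    items = sorted((i for i, b in bitmaps.items() if support(b) >= min_sup),
                   key=lambda i: (support(bitmaps[i]), i))
    mfi = []

    def dfs(head, head_bm, tail):
        extended = False
        for pos, item in enumerate(tail):
            bm = head_bm & bitmaps[item]  # bitmap AND gives the tidset of head + {item}
            if support(bm) >= min_sup:
                extended = True
                dfs(head | {item}, bm, tail[pos + 1:])
        # A node with no frequent extension is maximal unless a superset was already found.
        if not extended and head and not any(head <= m for m in mfi):
            mfi.append(head)

    all_tids = (1 << len(transactions)) - 1  # the empty itemset occurs in every transaction
    dfs(frozenset(), all_tids, items)
    return mfi
```

    Because the search visits itemsets in lexicographic depth-first order, any maximal superset of a leaf is reached before that leaf, so the simple containment test is enough for correctness in this unpruned version.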

    FastLMFI: An Efficient Approach for Local Maximal Patterns Propagation and Maximal Patterns Superset Checking

    Maximal frequent pattern superset checking plays an important role in efficiently mining complete Maximal Frequent Itemsets (MFI) and in pruning the maximal search space. In this paper we present FastLMFI, a new indexing approach for local maximal frequent pattern (itemset) propagation and maximal pattern superset checking. Experimental results on several sparse and dense datasets show that our approach outperforms the well-known progressive focusing technique. We have also integrated our superset checking approach with Mafia, an existing state-of-the-art maximal itemset algorithm, and compared our results with the current best maximal itemset algorithms, afopt-max and FP (zhu)-max. Our results outperform afopt-max and FP (zhu)-max on the dense chess and mushroom datasets at almost all support thresholds, which shows the effectiveness of our approach. Comment: 8 pages. In the proceedings of 4th ACS/IEEE International Conference on Computer Systems and Applications 2006, March 8, 2006, Dubai/Sharjah, UAE, 2006, Page(s) 452-45
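
    The core question in maximal superset checking is whether a candidate itemset is contained in some MFI already found. The sketch below is one simple indexing idea, assumed for illustration: each item maps to a bitmask of the MFIs that contain it, and ANDing the masks of the candidate's items leaves a nonzero result exactly when some MFI contains every candidate item. It does not reproduce FastLMFI's local MFI propagation. The class and method names are hypothetical.

```python
class MFIIndex:
    """Hypothetical item -> bitmask index over discovered maximal itemsets."""

    def __init__(self):
        self.mfis = []
        self.item_masks = {}  # item -> bitmask of MFI positions that contain the item

    def add(self, itemset):
        pos = len(self.mfis)
        self.mfis.append(frozenset(itemset))
        for item in itemset:
            self.item_masks[item] = self.item_masks.get(item, 0) | (1 << pos)

    def has_superset(self, candidate):
        """True if some stored MFI contains every item of `candidate`."""
        if not self.mfis:
            return False
        mask = (1 << len(self.mfis)) - 1  # start with all MFIs as potential supersets
        for item in candidate:
            mask &= self.item_masks.get(item, 0)
            if not mask:  # early exit: no MFI contains all items seen so far
                return False
        return True
```

    Compared with scanning every stored MFI per candidate, the check costs one AND per candidate item and can stop as soon as the mask becomes empty.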

    Seasonal to Inter-annual Climate Prediction Using Data Mining KNN Technique

    Abstract. The impact of seasonal to inter-annual climate prediction on society, business, agriculture and almost all aspects of human life has forced scientists to give the matter proper attention. The last few years have seen tremendous achievements in this field. All systems and techniques developed so far use Sea Surface Temperature (SST) as the main factor, among other seasonal climatic attributes, and then apply statistical and mathematical models for further climate predictions. In this paper, we develop a system that uses the historical weather data of a region (rain, wind speed, dew point, temperature, etc.) and applies the data-mining algorithm "K-Nearest Neighbor (KNN)" to classify these historical data into specific time spans. The k nearest time spans (k nearest neighbors) are then used to predict the weather. Our experiments show that the system generates accurate results within a reasonable time, for predictions months in advance.
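
    The nearest-time-span idea can be sketched as follows, under assumptions not stated in the abstract: each historical time span is summarized as a numeric feature vector (for example mean temperature and wind speed, already normalized to comparable scales) paired with the value observed in the following period, and the forecast is the mean of that value over the k spans closest to the current one by Euclidean distance. This is a KNN-regression sketch, not the authors' system, and `knn_forecast` is a hypothetical name.

```python
import math


def knn_forecast(history, query, k):
    """Predict the next-period value from the k most similar past time spans.

    history: list of (features, next_value) pairs, one per past time span (assumed layout).
    query:   feature vector of the current time span, same length as the stored features.
    """
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and len(history)")
    # Rank past spans by Euclidean distance to the current span.
    ranked = sorted(history, key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    # Average what happened after each of the k nearest spans.
    return sum(value for _, value in nearest) / k
```

    For example, with spans `((20.0, 5.0), 30.0)`, `((21.0, 5.5), 32.0)`, `((35.0, 1.0), 0.0)` and `((34.0, 1.5), 2.0)`, the query `(20.5, 5.2)` with `k=2` selects the first two spans and returns `31.0`. `math.dist` requires Python 3.8 or later.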

    Prevalence of ultrasonography proved polycystic ovaries in North Indian women with type 2 diabetes mellitus

    BACKGROUND: Polycystic ovaries (PCO) and their clinical expression (the polycystic ovary syndrome [PCOS]) as well as type 2 diabetes mellitus (T2DM) are common medical conditions linked through insulin resistance. We studied the prevalence of PCO and PCOS in women with diet- and/or oral-hypoglycemic-treated T2DM and in non-diabetic control women. DESIGN: Prospective study. METHODS: One hundred and five women of reproductive age with diet- and/or oral-hypoglycemic-treated T2DM were the subjects of the study. Sixty age-matched non-diabetic women served as controls. Transabdominal ultrasonographic assessment of the ovaries was used to diagnose PCO. Clinical, biochemical and hormonal parameters were also recorded. RESULTS: The ultrasonographic prevalence of PCO was higher in women with diabetes than in non-diabetic subjects (61.0% vs. 36.7%, P < 0.003), whereas the prevalence of PCOS was 37.1% in diabetic subjects and 25% in non-diabetic controls (P > 0.1). Diabetic women with PCO had diabetes of significantly longer duration than those without PCO (4.19±2.0 versus 2.9±1.6 yrs; p < 0.05). Among both diabetic and non-diabetic women, those with PCO had significantly higher plasma LH, LH/FSH ratio, total testosterone and androstenedione levels. CONCLUSION: This study demonstrates a higher prevalence of PCO in women with T2DM than in non-diabetic subjects.

    Broken link repairing system for constructing contextual information portals

    The web is an extremely powerful resource with the potential to improve education and health, and it enables access to new markets. There are, however, fundamental problems with web access in emerging regions: the primary issue is that internet connectivity is not keeping pace with the growing size and complexity of the web. Recently, contextual information portals (CIP) have been developed to mitigate the effect of low connectivity. A CIP provides an offline, searchable and browsable information portal whose content is composed of vertical slices of the internet about specific topics. CIP is therefore well suited to developing regions with limited internet access; for example, it can be used in schools and colleges to enhance lesson plans and educational material. Although a standalone CIP provides an interactive searching and browsing interface that gives a web-like experience, a fundamental problem users face is broken links. Crawling the web to build a CIP collection captures only a portion of the linked webpages rather than all possible documents, which creates broken links. To address this problem we develop a broken link repairing system (brLinkRepair). brLinkRepair is useful when a user tries to navigate between pages through links whose target pages are missing from the CIP. We treat link repair as an information retrieval task: for each broken link, our system recommends related pages that are similar to the missing target page. To further improve the effectiveness of the system, we combine all information sources using a learning to rank approach. Our results indicate that learning to rank (by combining information sources) improves effectiveness.
    Keywords: Information retrieval, Machine learning, Broken links, Learning to rank, Contextual information portals for intermittent network
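
    The retrieval step can be sketched as ranking every page in the CIP collection by its similarity to whatever text is known about the missing page. The sketch below assumes, hypothetically, that each broken link comes with a few text sources such as its anchor text and surrounding context, scores pages by TF-IDF cosine similarity against each source, and combines the per-source scores linearly. The weights are placeholders for the values a learning-to-rank model would learn from training data; this is not brLinkRepair's actual feature set or model, and all names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_index(pages):
    """TF-IDF vectors (dict term -> weight) for each page, plus the idf table."""
    tfs = [Counter(tokenize(p)) for p in pages]
    df = Counter(term for tf in tfs for term in tf)
    n = len(pages)
    idf = {t: math.log(n / d) + 1.0 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in tf.items()} for tf in tfs], idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_replacements(pages, sources, weights):
    """Return page indices ordered from best to worst replacement for one broken link.

    sources: hypothetical evidence about the missing page, e.g. {"anchor": ..., "context": ...}
    weights: per-source weights (placeholders; learning to rank would learn these).
    """
    page_vecs, idf = tfidf_index(pages)
    query_vecs = {}
    for name, text in sources.items():
        qtf = Counter(tokenize(text))
        # Terms absent from the collection cannot match any page, so they are dropped.
        query_vecs[name] = {t: c * idf[t] for t, c in qtf.items() if t in idf}
    scored = []
    for i, vec in enumerate(page_vecs):
        score = sum(weights.get(name, 0.0) * cosine(q, vec) for name, q in query_vecs.items())
        scored.append((-score, i))  # negate so ascending sort puts the best page first
    scored.sort()
    return [i for _, i in scored]
```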

    An Improved Retrievability-Based Cluster-Resampling Approach for Pseudo Relevance Feedback

    Cluster-based pseudo-relevance feedback (PRF) is an effective approach for finding relevant documents for relevance feedback. The standard approach constructs clusters for PRF solely on the basis of high similarity between retrieved documents. It works well as long as the retrieval bias of the retrieval model does not affect the retrievability of documents. In our experiments we observed that when a collection exhibits retrieval bias, highly retrievable documents in clusters are frequently retrieved at top positions for most queries, and these documents cause the relevance feedback to drift away from relevant documents. To reduce this noise (retrieval bias), we enhance the standard cluster construction approach by building clusters on the basis of both high similarity and retrievability. We call this retrievability and cluster-based PRF. This enhanced approach keeps in the clusters only those documents that are not frequently retrieved due to retrieval bias. Although this approach improves effectiveness, it penalizes highly retrievable documents even when these documents are the most relevant to the clusters. To handle this problem, in a second approach, we extend the basic retrievability concept by mining frequent neighbors of the clusters. The frequent neighbors approach keeps in the clusters only those documents that are frequently retrieved with other neighbors of the clusters and infrequently retrieved with documents that are not part of the clusters. Experimental results show that the two proposed extensions help identify relevant documents for relevance feedback and increase the effectiveness of queries.
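
    A minimal sketch of the first extension follows, assuming the common cumulative retrievability measure: r(d) counts how many queries in a query set return document d within the top `cutoff` results. Cluster members are then required to be both similar to the seed document and not overly retrievable. The similarity function, `sim_threshold` and `max_r` are hypothetical parameters for illustration, not values from the paper, and the frequent-neighbors extension is not shown.

```python
from collections import Counter


def retrievability(result_lists, cutoff):
    """Cumulative retrievability: r(d) = number of queries that rank d within the top `cutoff`."""
    r = Counter()
    for ranking in result_lists:
        for doc in ranking[:cutoff]:
            r[doc] += 1
    return r  # documents never retrieved in the top `cutoff` get r(d) = 0 via Counter


def build_cluster(seed, candidates, similarity, r, sim_threshold, max_r):
    """Keep candidates that are similar to `seed` and whose retrievability does not exceed `max_r`."""
    return [
        d for d in candidates
        if d != seed
        and similarity(seed, d) >= sim_threshold
        and r[d] <= max_r  # drop documents likely promoted by retrieval bias
    ]
```

    With rankings `["d1", "d2", "d3"]`, `["d1", "d4", "d2"]` and `["d1", "d5", "d6"]` and a cutoff of 2, `d1` reaches r = 3, so a cluster built with `max_r=2` excludes it even if it is highly similar to the seed. This is exactly the over-penalization of highly retrievable documents that the abstract describes.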